Skip to content

feat: add ai-cache plugin#13578

Open
janiussyafiq wants to merge 12 commits into
apache:masterfrom
janiussyafiq:feat/ai-cache-exact
Open

feat: add ai-cache plugin#13578
janiussyafiq wants to merge 12 commits into
apache:masterfrom
janiussyafiq:feat/ai-cache-exact

Conversation

@janiussyafiq

@janiussyafiq janiussyafiq commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Description

Adds a new ai-cache plugin that caches LLM responses and replays them for subsequent requests that resolve to the same prompt, cutting upstream token cost and latency for repetitive workloads (FAQ bots, document Q&A, translation).

This PR implements the exact (L1) cache layer:

  • Cache key — a SHA-256 fingerprint of the request as received: client protocol, requested model, normalized messages, and the remaining response-determining body parameters (temperature, top_p, max_tokens, tools, …). Provider-agnostic via ai-protocols, so it works for every chat protocol ai-proxy supports (OpenAI Chat, Anthropic Messages, Bedrock Converse, OpenAI Responses). The key also segments by the selected AI instance — the ai-proxy provider, or the ai-proxy-multi instance picked for the request (recomputed if a retry_on_error fallback answers) — so identical prompts that resolve to instances backed by different models or providers never share an entry.
  • Storage — Redis (single-node); connection fields are sourced from apisix.utils.redis-schema via the policy + if/then convention used by limit-count / limit-req / limit-conn.
  • Scope — per-route by default (cache_key.share_across_routes to share one cache space across routes); opt-in per-consumer / per-variable isolation (cache_key.include_consumer / include_vars).
  • Behavior — write-on-200 only (non-streaming); bypass_on opt-out (exact request-header match); fail_mode (skip / warn / error) when a request did not pass through ai-proxy / ai-proxy-multi; max_cache_body_size cap; X-AI-Cache-Status / X-AI-Cache-Age response headers; fails open (proxies as a normal miss) when Redis is unreachable.
  • Runs below ai-proxy (priority 1035) and depends on ai-proxy / ai-proxy-multi.

Semantic cache, streaming support, and observability are planned as follow-up PRs. User-facing documentation will be added in a later PR once the series is further along.

Which issue(s) this PR fixes:

Related to #13290

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible

@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. enhancement New feature or request plugin labels Jun 19, 2026
Comment thread apisix/plugins/ai-cache/key.lua Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new ai-cache APISIX plugin that provides an L1 exact-match cache for non-streaming LLM requests handled by ai-proxy, using Redis as the backend and exposing cache debug headers.

Changes:

  • Introduces the ai-cache plugin implementation, schema, and keying logic (SHA-256 fingerprint + configurable scope).
  • Adds an end-to-end test suite covering MISS/HIT, bypassing, TTL expiry, scope isolation, and fail-open behavior.
  • Wires the plugin into the default plugin lists and build/install packaging.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
apisix/plugins/ai-cache.lua Core plugin logic: lookup on access, capture on body/log, Redis integration, cache headers.
apisix/plugins/ai-cache/schema.lua JSON schema for plugin configuration, leveraging apisix.utils.redis-schema via policy + if/then.
apisix/plugins/ai-cache/key.lua Cache key fingerprinting (protocol/model/messages/params) and scope computation.
t/plugin/ai-cache.t New functional + unit tests for cache behavior and edge cases.
t/admin/plugins.t Adds ai-cache to the admin plugin list expectation.
conf/config.yaml.example Adds ai-cache to the example plugin list with priority comment.
apisix/cli/config.lua Adds ai-cache to the CLI’s default plugin list.
Makefile Installs the apisix/plugins/ai-cache/ directory Lua modules during make install.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread apisix/plugins/ai-cache/key.lua
Comment thread apisix/plugins/ai-cache.lua
Comment thread apisix/plugins/ai-cache.lua
Comment thread t/plugin/ai-cache.t
Comment thread apisix/plugins/ai-cache.lua
…ss_on

Encode the request fingerprint with rapidjson (sort_keys) plus a
to_rapidjson_value pass that maps the JSON null sentinel and array_mt
tables, mirroring ai-transport/http.lua. core.json.stably_encode (dkjson)
raised on the cjson null sentinel, so a body carrying an explicit null
(e.g. OpenAI's "stop": null) would error out of the access phase.

Replace the cache_bypass var-ref opt-out with bypass_on: an array of
{header, equals} rules that skip the cache when a request header exactly
equals its value (per rfcs#78). Exact header == value only; any matching
rule triggers BYPASS.

Tests: add a null-body fingerprint regression, migrate the bypass tests
to bypass_on, and cover multiple rules where any match bypasses.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Comment thread apisix/plugins/ai-cache.lua
Comment thread apisix/plugins/ai-cache/schema.lua
Comment thread t/plugin/ai-cache.t Outdated
Comment thread apisix/plugins/ai-cache.lua
Document the ai-cache plugin: description, full attribute table (incl. all
Redis policy fields), and Admin API / ADC / Ingress Controller examples
covering cache MISS/HIT and bypass_on. Add the page to the en and zh plugin
sidebars.
Comment thread apisix/plugins/ai-cache/key.lua Outdated
Comment thread apisix/plugins/ai-cache/key.lua Outdated
Comment thread apisix/plugins/ai-cache.lua Outdated
Comment thread apisix/plugins/ai-cache.lua
Comment thread apisix/plugins/ai-cache/schema.lua Outdated
nic-6443
nic-6443 previously approved these changes Jun 23, 2026

@nic-6443 nic-6443 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick turnaround — all my comments are addressed: per-route scoping by default with share_across_routes opt-out, red:close() on Redis errors instead of pooling a broken connection, the dead layers knob dropped, and the canonical encoding pulled up into core.json.canonical_encode (nicely de-duped with ai-transport). LGTM.

@membphis

Copy link
Copy Markdown
Member

I found two merge-blocking issues in the current ai-cache implementation:

[P1] Cache key does not include the effective model or picked AI instance

ai-cache computes the fingerprint from ctx.var.request_llm_model or body.model, but it does not include ctx.picked_ai_instance_name, provider, or the route / instance effective options.model:

  • apisix/plugins/ai-cache/key.lua: the fingerprint uses only protocol, requested model, normalized messages, and remaining body params.
  • apisix/plugins/ai-cache.lua: the lookup happens in access, before the upstream request is built.
  • ai-proxy-multi has already selected ctx.picked_ai_instance before lower-priority plugins run, so that selected instance is available at cache lookup time.

This can return the wrong provider/model response on an ai-proxy-multi route. A request can warm the cache through instance A, then a later identical request can be routed to instance B but still hit and replay instance A's response because both requests share the same cache key.

This should be fixed before merge by including the selected AI instance and/or effective model/provider in the cache key or scope, with a regression test covering ai-proxy-multi instances that use different models or providers.

[P2] The plugin can cache ordinary JSON traffic when it is not behind ai-proxy

The docs say ai-cache must be used with ai-proxy or ai-proxy-multi, but the implementation does not enforce or safely bypass that condition. ai-cache.access reads any JSON request body, computes a key, and marks the request as MISS; then log writes any 200 response to Redis. There is no ctx.picked_ai_instance guard like the existing AI moderation plugins use.

If the plugin is accidentally attached at Route / Service / Consumer level without an AI proxy, ordinary JSON upstream responses can be cached and replayed. That is a surprising behavior and can leak stale or incorrect non-AI responses.

Please add a guard before key computation, either bypassing by default or using the shared ai-protocols.binding fail_mode behavior, and add coverage for the no-ai-proxy case.

@membphis

Copy link
Copy Markdown
Member

I rechecked the latest update. The no-ai-proxy guard looks addressed now: ai-cache.access checks ctx.picked_ai_instance first, uses the shared ai-protocols.binding fail_mode, and the tests cover both the default bypass behavior and fail_mode=error.

One merge-blocking cache-key issue still remains:

[P1] share_across_routes can still reuse a response across different effective models on plain ai-proxy routes

The new scope includes ctx.picked_ai_instance_name, which fixes the ai-proxy-multi case because different picked instances have different names. However, for the plain ai-proxy plugin, ctx.picked_ai_instance_name is only ai-proxy-<provider> (for example, ai-proxy-openai). It does not include the route-level effective model.

The fingerprint still uses the requested model from ctx.var.request_llm_model or body.model, not the effective model selected by ai-proxy from ai_instance.options.model or request_model. This leaves a real collision when cache sharing is enabled across routes:

  1. Route A uses ai-proxy with provider=openai, options.model=gpt-4o, and ai-cache.cache_key.share_across_routes=true.
  2. Route B uses ai-proxy with provider=openai, options.model=gpt-4o-mini, and the same Redis/cache settings.
  3. The client sends the same body to both routes, especially if the body omits model or carries the same requested model.
  4. Both routes compute the same scope (instance=ai-proxy-openai) and the same fingerprint, so Route B can replay Route A's cached response even though the effective upstream model is different.

This also contradicts the new docs, which say that even with share_across_routes enabled, responses from different upstream models or providers are kept in separate cache entries.

Please include the effective model in the key/scope for the plain ai-proxy case, for example by deriving it from ctx.picked_ai_instance.options.model or body.model before lookup. It would also be good to add a regression test with two plain ai-proxy routes using the same provider, different options.model, the same Redis, and share_across_routes=true; the second route should be a MISS, not a HIT.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request plugin size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants